=> Easier to conduct your analysis, and even more so for others!
Data are heterogeneous in:
Spreadsheets are (still) the primary data entry tool of the digital age!
the Bad:
the Ugly:
=> Encourages you to mix your data and your analysis
=> A spreadsheet is not a table!
Table = Relation = Data set (~ Worksheet)
Column = Variable = Attribute = Characteristic
Row = Record = Tuple <> Observation
Keys are used to Join or Merge
Cell = Value = Measurement
Data Model = Schema
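To make the vocabulary above concrete, here is a minimal sketch in plain Python. It assumes a table can be represented as a list of rows, each row a dict mapping column names to cell values. The table contents, the column names (`site_id`, `site_name`, `cover_pct`) and the helper `join_on` are hypothetical and chosen only for illustration.

```python
# Two small tables; each dict is one row (record), each dict key is a column (variable).
# All names and values here are made up for illustration.
sites = [
    {"site_id": 1, "site_name": "North Ridge"},
    {"site_id": 2, "site_name": "Creek Bend"},
]
plot_obs = [
    {"obs_id": 10, "site_id": 1, "cover_pct": 42.0},
    {"obs_id": 11, "site_id": 1, "cover_pct": 37.5},
    {"obs_id": 12, "site_id": 2, "cover_pct": 12.0},
]


def join_on(left, right, key):
    """Inner-join two tables (lists of dict rows) on a shared key column."""
    # Index the right-hand table by key value so each lookup is cheap.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            # Combine the columns of both matching rows into one wider row.
            joined.append({**lrow, **rrow})
    return joined


# The shared key column "site_id" is what lets the two tables be merged.
merged = join_on(plot_obs, sites, "site_id")
for row in merged:
    print(row["obs_id"], row["site_name"], row["cover_pct"])
```

Each merged row carries the observation's own values plus the name of the site it belongs to, which is what a join or merge on a key gives you in a database or a data-frame library.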
Observations about different entities combined
Observations. A better way to model data is to organize the observations about each type of entity in its own table. This results in:
This is normalized data (aka tidy data)
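As a rough illustration of this idea, the sketch below starts from one combined table in which site details are repeated on every observation row, and splits it into one table per entity type. The data, the column names and the helper `normalize` are hypothetical; this is a sketch of the principle, not a general normalization routine.

```python
# A combined table: the site name is repeated on every observation of that site.
# All names and values are made up for illustration.
combined = [
    {"site_id": 1, "site_name": "North Ridge", "obs_date": "2020-06-01", "cover_pct": 42.0},
    {"site_id": 1, "site_name": "North Ridge", "obs_date": "2020-07-01", "cover_pct": 37.5},
    {"site_id": 2, "site_name": "Creek Bend", "obs_date": "2020-06-01", "cover_pct": 12.0},
]


def normalize(rows, key, entity_columns):
    """Split rows into an entity table (one row per key value) and an observation table."""
    entities = {}
    observations = []
    for row in rows:
        # Keep one copy of the entity's attributes per key value.
        entities[row[key]] = {c: row[c] for c in [key, *entity_columns]}
        # Observations keep the key (to link back) but drop the repeated attributes.
        observations.append({c: v for c, v in row.items() if c not in entity_columns})
    return list(entities.values()), observations


sites, plot_obs = normalize(combined, "site_id", ["site_name"])
print(sites)
print(plot_obs)
```

After the split, each site name is stored once, and every observation row refers to its site only through `site_id`.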
Variables. In addition, for normalized data, we expect the variables to be organized such that:
With normalized data, we often use unique identifiers to reference particular observations, which allows us to link across tables. Two types of identifiers are common within relational data:
An Entity-Relationship model allows us to compactly draw the structure of the tables in a relational database, including the primary and foreign keys in the tables.
In the above model, each site in the SITES table must have one or more observations in the PLOTOBS table, whereas each PLOTOBS observation belongs to one and only one site.
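One way such a model might be expressed in a relational database is sketched below with Python's built-in `sqlite3` module. Only the table names SITES and PLOTOBS come from the model; the column names (`site_id`, `site_name`, `obs_id`, `cover_pct`) and the sample values are assumptions made for illustration. Note that the sketch enforces only the "each PLOTOBS has one and only one SITE" side of the relationship; requiring every site to have at least one observation is not something a plain foreign key expresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE sites (
    site_id   INTEGER PRIMARY KEY,      -- primary key: identifies each site
    site_name TEXT NOT NULL
);
CREATE TABLE plotobs (
    obs_id    INTEGER PRIMARY KEY,      -- primary key: identifies each observation
    site_id   INTEGER NOT NULL
              REFERENCES sites(site_id), -- foreign key: exactly one site per observation
    cover_pct REAL
);
""")

conn.execute("INSERT INTO sites VALUES (1, 'North Ridge')")
conn.executemany(
    "INSERT INTO plotobs VALUES (?, ?, ?)",
    [(10, 1, 42.0), (11, 1, 37.5)],
)

# An observation pointing at a site that does not exist should be rejected.
try:
    conn.execute("INSERT INTO plotobs VALUES (12, 99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Link the two tables through the key to count observations per site.
rows = conn.execute("""
    SELECT s.site_name, COUNT(*)
    FROM plotobs AS p
    JOIN sites AS s ON s.site_id = p.site_id
    GROUP BY s.site_id
""").fetchall()
print(orphan_rejected, rows)
```

The foreign key is what stops an observation from referring to a site that is missing from the SITES table, and the same key is what the query uses to link the two tables.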